Sentence alignment in bilingual corpora based on crosslingual querying
Abstract
The effectiveness of translation memory for computer-aided translation depends on the results of previous sentence alignment. This paper describes a new approach to sentence alignment based on crosslingual querying, using the technology of an existing product, SPIRIT (Syntactic and Probabilistic Indexing and Retrieval of Information in Texts). Sentence alignment and crosslingual querying based on bilingual reformulation are similar problems: both rely on semantic proximity between two texts in different languages, and both aim to find the sentences that contain most of the information demanded by the query. However, sentence alignment additionally requires the irrelevant part of a sentence to be as short as possible. Crosslingual querying provides sentence alignment with candidates. The ARCADE evaluation has shown that this approach is very robust in cases of inverted sentence order and missing segments.
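The idea in the abstract can be illustrated with a minimal sketch. It is not the SPIRIT implementation, whose internals the abstract does not describe. Assumptions: a tiny hypothetical French-to-English lexicon (`TOY_LEXICON`) stands in for bilingual reformulation, whitespace tokenization stands in for linguistic analysis, and the function names and the `min_coverage` threshold are invented for illustration. Each source sentence is reformulated into a target-language query, every target sentence is scored as a candidate, and the candidate that covers most of the query with the shortest irrelevant part is kept.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy lexicon (French -> English), for illustration only.
TOY_LEXICON: Dict[str, Set[str]] = {
    "chat": {"cat"},
    "noir": {"black"},
    "dort": {"sleeps"},
    "chien": {"dog"},
    "mange": {"eats"},
}


def tokenize(sentence: str) -> List[str]:
    """Lowercase and split on whitespace (a stand-in for real analysis)."""
    return sentence.lower().split()


def reformulate(sentence: str, lexicon: Dict[str, Set[str]]) -> Set[str]:
    """Turn a source sentence into a set of target-language query terms."""
    terms: Set[str] = set()
    for token in tokenize(sentence):
        terms |= lexicon.get(token, set())
    return terms


def score(query: Set[str], candidate: str) -> Tuple[float, int]:
    """Return (coverage of the query, size of the irrelevant part)."""
    cand_tokens = set(tokenize(candidate))
    coverage = len(query & cand_tokens) / len(query) if query else 0.0
    irrelevant = len(cand_tokens - query)
    return coverage, irrelevant


def align(
    source: List[str],
    target: List[str],
    lexicon: Dict[str, Set[str]],
    min_coverage: float = 0.5,  # assumed threshold, not from the paper
) -> List[Tuple[int, Optional[int]]]:
    """Pair each source sentence with its best target candidate, or None."""
    alignments: List[Tuple[int, Optional[int]]] = []
    for i, sentence in enumerate(source):
        query = reformulate(sentence, lexicon)
        best_j: Optional[int] = None
        best_key: Optional[Tuple[float, int]] = None
        # Every target sentence is a candidate, regardless of its position.
        for j, candidate in enumerate(target):
            coverage, irrelevant = score(query, candidate)
            if coverage < min_coverage:
                continue
            key = (coverage, -irrelevant)  # high coverage, short irrelevant part
            if best_key is None or key > best_key:
                best_key, best_j = key, j
        alignments.append((i, best_j))
    return alignments


if __name__ == "__main__":
    src = ["le chat noir dort", "le chien mange"]
    tgt = ["the dog eats", "the black cat sleeps"]  # inverted order
    print(align(src, tgt, TOY_LEXICON))
```

Because candidates are retrieved by content rather than by position, the sketch pairs sentences correctly when their order is inverted, and it returns `None` for a source sentence whose translation is missing. This mirrors, in a very simplified way, the robustness the abstract reports.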
Similar resources
Sentence Alignment in Parallel, Comparable, and Quasi-comparable Corpora
We explore the usability of different bilingual corpora for the purpose of multilingual and cross-lingual natural language processing. The usability of a bilingual corpus is evaluated by the lexical alignment score calculated for the bi-lexicon pair distributed in the aligned bilingual sentence pairs. We compare and contrast a number of bilingual corpora, ranging from parallel, to comparable, and...
Bilingual Lexicon Construction Using Large Corpora
This paper introduces a method for learning bilingual term and sentence level alignments for the purpose of building bilingual lexicons. Combining statistical techniques with linguistic knowledge, a general algorithm is developed for learning term and sentence alignments from large bilingual corpora with high accuracy. This is achieved through the use of filtered linguistic feedback between term ...
Sentence Alignment of Historical Classics based on Mode Prediction and Term Translation Pairs
Parallel corpora are essential resources for the construction of bilingual term dictionary of historical classics. To obtain large-scale parallel corpora, this paper proposes a sentence alignment method based on mode prediction and term translation pairs. On one hand, the method rebuilds the sentence alignment process according to characteristics of the translation of historical classics, and a...
Dealing with Out-Of-Vocabulary Problem in Sentence Alignment Using Word Similarity
Sentence alignment plays an essential role in building bilingual corpora which are valuable resources for many applications like statistical machine translation. In various approaches of sentence alignment, length-and-word-based methods which are based on sentence length and word correspondences have been shown to be the most effective. Nevertheless a drawback of using bilingual dictionaries tr...
Tibetan-Chinese Bilingual Sentences Alignment Method based on Multiple Features
Sentence-level alignment of bilingual parallel corpora plays a significant and indispensable role in machine translation, translation knowledge acquisition, and bilingual lexicography research, and is fundamental work for natural language processing. Given the great deal of work in sentence alignment and the variety of methods that have been developed for bilingual terminology extraction, those are...